Clean up QK and file and tensor types #678

Closed · wants to merge 2 commits

Conversation

@sw sw (Collaborator) commented Apr 1, 2023

(edit: the Python changes will clash with #545)

This PR has several goals:

  • reduce the number of integer literals strewn across the code base
  • better ensure array and enum consistency
  • clear up possible confusion about file and tensor types
  • prepare the way for new quantization formats with QK != 32, as discussed in #456 (2-bit integer quantization)
  • deduplicate python definitions

For the Python scripts, I introduce a new file, ggml.py, at the top level, which contains definitions of the file and tensor types equivalent to those in ggml.h. I have formatted it with black, in case #611 returns from the dead.
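
A rough sketch of what such a module might contain (illustrative only: the GGML_FILE values follow the ftype_str table quoted below, while the GGML_TYPE values are placeholders that would have to be copied from enum ggml_type in ggml.h):

```python
# ggml.py -- sketch of shared definitions mirroring ggml.h (illustrative only)
from enum import IntEnum

QK = 32  # quantization block size currently shared by the conversion scripts


class GGML_FILE(IntEnum):
    """Model file types, matching the f16/ftype field in the file header."""
    F32  = 0
    F16  = 1
    Q4_0 = 2
    Q4_1 = 3


class GGML_TYPE(IntEnum):
    """Tensor data types; the numeric values must mirror enum ggml_type in ggml.h."""
    F32  = 0   # placeholder values -- take the real ones from ggml.h
    F16  = 1
    Q4_0 = 2
    Q4_1 = 3
```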

The changes to the Python files on one hand and to the C/C++ code on the other are technically independent, but the discussion will overlap, so I'm keeping this as one draft. I will split it later if that makes sense.

I have tested conversion from pth to ggml with identical outputs, but I have not tested the other conversion scripts.

Open questions:

  • should enum e_ftype be moved to llama.h (with sensible renaming)? This would allow us to eliminate the hard-coded 2 and 3 in the usage string of quantize.cpp, and might be useful elsewhere.
  • how is file type 4 used? I could not find any other part of the code that uses it. (I see now: it's for GPTQ models. Should I add FTYPE_GPTQ?)
    case 4: wtype = GGML_TYPE_Q4_1; vtype = GGML_TYPE_F16; break;
  • Is GGML_FILE ok or should it be LLAMA_FILE? ggml.c doesn't deal with that type.

Some more comments below, looking for your thoughts on this...

FTYPE_Q4_0 = 2,
FTYPE_Q4_1 = 3,
};
static const char * ftype_str[] = { "f32", "f16", "q4_0", "q4_1" };
Collaborator Author

Unfortunately, pedantic ISO C++ does not allow designated initializers.

@@ -100,7 +109,7 @@ struct llama_hparams {
     int32_t n_head = 32;
     int32_t n_layer = 32;
     int32_t n_rot = 64;
-    int32_t f16 = 1;
+    int32_t f16 = FTYPE_F16;
@sw sw (Collaborator Author) Apr 1, 2023

I'm not sure if this is the correct semantics - what's the intention behind ftype and f16?

Owner

f16 is badly named; it dates from the time when I was only considering F16 and F32 and no other data types.

@@ -222,7 +179,7 @@ def copy_tensors(fin, fout, part_id, n_parts):
 
     # ensure tensor data is aligned
     tensor_data_offset = fout.tell()
-    while tensor_data_offset % QK != 0:
+    while tensor_data_offset % 32 != 0:
Collaborator Author

llama_model_load and llama_model_quantize_internal have 32 hard-coded as well. What matters here is not the quantized block size, but whether the processor can efficiently load and store at a multiple of this number, especially with SIMD instructions.
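
A minimal sketch of how the Python side could name that alignment explicitly instead of reusing QK (the TENSOR_ALIGNMENT constant and the helper are hypothetical, not part of this PR):

```python
# Hypothetical: alignment of tensor data in the file, chosen for efficient
# (SIMD) loads and stores, deliberately not tied to the quantization block size QK.
TENSOR_ALIGNMENT = 32  # bytes


def align_tensor_data(fout) -> None:
    """Pad the output file with zero bytes up to the next aligned offset."""
    offset = fout.tell()
    padding = -offset % TENSOR_ALIGNMENT
    fout.write(b"\x00" * padding)
```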

@prusnak prusnak (Collaborator) commented Apr 1, 2023

I think it would be worthwhile to separate the more straightforward Python changes into a separate PR which can be reviewed and merged sooner than the more complex changes in C.

@sw sw (Collaborator Author) commented Apr 1, 2023

> I think it would be worthwhile to separate the more straightforward Python changes into a separate PR which can be reviewed and merged sooner than the more complex changes in C.

Agreed, but I would like the Python definitions to stay close to the C/C++ ones:

Python GGML_TYPE.F32 <-> C GGML_TYPE_F32
Python GGML_FILE.Q4_1 <-> C++ FTYPE_Q4_1 (this should match better)

So I'm looking for opinions; once we have general consensus on whether and how this should be done, I will split the PR.
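
As an illustration of the intended deduplication, a conversion script could then write the file-type field via the shared names instead of a bare integer literal (a sketch assuming a ggml.py module as proposed above; write_ftype is a hypothetical helper):

```python
import struct

from ggml import GGML_FILE  # shared definitions proposed in this PR


def write_ftype(fout, use_f16: bool) -> None:
    """Write the file-type header field using the shared enum instead of a literal."""
    ftype = GGML_FILE.F16 if use_f16 else GGML_FILE.F32
    fout.write(struct.pack("i", int(ftype)))
```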

@@ -32,6 +33,7 @@ def write_header(f_out, header):

if magic != 0x67676d6c:
Collaborator

Can you make the magic a constant too?

@sw sw (Collaborator Author) Apr 1, 2023

Good idea, though that's the old magic from before mmap. Someone ought to migrate the *-to-ggml.py scripts. (edit: #704, #545)

Collaborator

We need old magic constants anyway to detect older models.
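
For illustration, named constants along these lines would let the scripts recognize both formats (a sketch: GGML_MAGIC_OLD is the value checked in the diff above; the newer value shown is only indicative and would be taken from llama.cpp):

```python
# Sketch: named magic constants so conversion scripts can detect older model files.
GGML_MAGIC_OLD = 0x67676D6C  # "ggml" -- unversioned, pre-mmap files (checked above)
GGML_MAGIC = 0x67676A74      # "ggjt" -- indicative; take the real value from llama.cpp


def check_magic(magic: int) -> None:
    """Accept current files, warn on legacy ones, reject everything else."""
    if magic == GGML_MAGIC_OLD:
        print("warning: unversioned (pre-mmap) model file; consider migrating")
    elif magic != GGML_MAGIC:
        raise ValueError(f"invalid magic: {magic:#x}")
```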

@sw sw mentioned this pull request Apr 2, 2023
@ggerganov ggerganov (Owner) left a comment

These refactors and code-maintenance changes are very helpful:

  • enum e_ftype can be moved to llama.h and be called enum llama_ftype
  • I think GGML_FILE -> LLAMA_FTYPE

@sw sw (Collaborator Author) commented Apr 2, 2023

> • enum e_ftype can be moved to llama.h and be called enum llama_ftype
> • I think GGML_FILE -> LLAMA_FTYPE

@ggerganov I'm glad you like the overall direction. I'll make a separate PR with this enum; then we might sync it with the Python code in #545 before that gets merged.

@sw sw mentioned this pull request Apr 2, 2023
@sw sw (Collaborator Author) commented Apr 15, 2023

Obsolete thanks to #709, #1001, #545.

@sw sw closed this Apr 15, 2023
@sw sw deleted the qk-ftypes branch April 22, 2023 08:12